Unsupervised methods for learning distributed representations of words are ubiquitous in today's NLP research, but far less is known about the best ways to learn distributed phrase or sentence representations from unlabelled data. This paper is a systematic comparison of models that learn such representations. We find that the optimal approach depends critically on the intended application. Deeper, more complex models are preferable for representations to be used in supervised systems, but shallow log-linear models work best for building representation spaces that can be decoded with simple spatial distance metrics. We also propose two new unsupervised representation-learning objectives designed to optimise the trade-off between training time, domain portability and performance.